Statistical and computational tradeoffs in biclustering

Authors

  • Sivaraman Balakrishnan
  • Mladen Kolar
  • Alessandro Rinaldo
  • Aarti Singh
  • Larry Wasserman
Abstract

We consider the problem of identifying a small sub-matrix of activation in a large noisy matrix. We establish the minimax rate for the problem by showing tight (up to constants) upper and lower bounds on the signal strength needed to identify the sub-matrix. We consider several natural computationally tractable procedures and show that, under most parameter scalings, they are unable to identify the sub-matrix at the minimax signal strength. While we are unable to directly establish the computational hardness of the problem at the minimax signal strength, we discuss connections to some known NP-hard problems and their approximation algorithms.
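The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes standard Gaussian noise, a square hidden block with a constant elevated mean `mu`, and a simple row/column-sum selection rule as one example of a computationally tractable procedure. The function names `planted_matrix` and `top_k_by_sum` are hypothetical.

```python
import numpy as np

def planted_matrix(n, k, mu, rng):
    """Return an n x n Gaussian noise matrix with mean mu added on a hidden k x k block.

    Assumption: i.i.d. N(0, 1) noise and a constant signal level mu.
    """
    rows = rng.choice(n, size=k, replace=False)
    cols = rng.choice(n, size=k, replace=False)
    A = rng.standard_normal((n, n))
    A[np.ix_(rows, cols)] += mu  # elevate the hidden sub-matrix
    return A, set(rows.tolist()), set(cols.tolist())

def top_k_by_sum(A, k):
    """Estimate the block as the k rows and k columns with the largest sums."""
    row_hat = np.argsort(A.sum(axis=1))[-k:]
    col_hat = np.argsort(A.sum(axis=0))[-k:]
    return set(row_hat.tolist()), set(col_hat.tolist())

rng = np.random.default_rng(0)
A, rows, cols = planted_matrix(n=200, k=20, mu=10.0, rng=rng)
row_hat, col_hat = top_k_by_sum(A, k=20)
print("rows recovered:", row_hat == rows)
print("columns recovered:", col_hat == cols)
```

The signal level here is deliberately large so that the sums of active rows stand well clear of the noise. Lowering `mu` makes this simple rule start to fail, which is the kind of gap between tractable procedures and the minimax signal strength that the abstract refers to.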

Related resources

Differential gene co-expression networks via Bayesian biclustering models

Identifying latent structure in large data matrices is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are locally co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-re...


Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-...


Discovering Relevance-Dependent Bicluster Structure from Relational Data

In this paper, we propose a statistical model for relevance-dependent biclustering to analyze relational data. The proposed model factorizes relational data into bicluster structure with two features: (1) each object in a cluster has a relevance value, which indicates how strongly the object relates to the cluster and (2) all clusters are related to at least one dense block. These features simp...


Applying Biclustering to understand the molecular basis of phenotypic diversity

High-throughput techniques, such as DNA microarrays, that are used in gene expression measurements offer a unique and global insight into the molecular mechanisms of a living cell. Computational resources are fundamental in order to extract biologically interpretable information and deal with the large amount of data produced by these techniques. Statistical analysis of microarray data is a ...


Publication date: 2011